Efficient Training for Human Video Generation with Entropy-Guided Prioritized Progressive Learning

Li, Changlin, Zhang, Jiawei, Liu, Shuhao, Lin, Sihao, Shi, Zeyi, Li, Zhihui, Chang, Xiaojun

arXiv.org Artificial Intelligence

Human video generation has advanced rapidly with the development of diffusion models, but the high computational cost and substantial memory consumption of training these models on high-resolution, multi-frame data pose significant challenges. In this paper, we propose Entropy-Guided Prioritized Progressive Learning (Ent-Prog), an efficient training framework tailored for diffusion models on human video generation. First, we introduce Conditional Entropy Inflation (CEI) to assess the importance of different model components for the target conditional generation task, enabling prioritized training of the most critical components. Second, we introduce an adaptive progressive schedule that increases computational complexity during training based on measured convergence efficiency. Ent-Prog reduces both training time and GPU memory consumption while maintaining model performance. Extensive experiments across three datasets demonstrate the effectiveness of Ent-Prog, achieving up to 2.2$\times$ training speedup and 2.4$\times$ GPU memory reduction without compromising generative performance.
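The adaptive progressive schedule can be pictured with a small sketch. The stage names, window size, and threshold below are illustrative assumptions, not the paper's actual criterion; the idea is simply to advance to a more expensive training stage once the recent rate of loss improvement stalls.

```python
# Hypothetical sketch of an adaptive progressive schedule: training complexity
# (here, a stage label standing in for resolution/frame count) is increased
# when convergence efficiency -- the recent rate of loss improvement -- stalls.

def convergence_efficiency(losses, window=5):
    """Average per-step loss improvement over the last `window` steps."""
    if len(losses) < window + 1:
        return float("inf")  # too early to judge; keep training
    recent = losses[-(window + 1):]
    return (recent[0] - recent[-1]) / window

def progressive_schedule(loss_stream, stages, threshold=0.01):
    """Walk through training stages, advancing when improvement stalls."""
    stage_idx, losses, history = 0, [], []
    for loss in loss_stream:
        losses.append(loss)
        history.append(stages[stage_idx])
        if convergence_efficiency(losses) < threshold and stage_idx < len(stages) - 1:
            stage_idx += 1
            losses = []  # reset statistics for the new, harder stage
    return history

# Toy loss curve: fast improvement, then a plateau that triggers a stage switch.
losses = [1.0 - 0.05 * i for i in range(10)] + [0.5] * 10
print(progressive_schedule(losses, stages=["64px", "128px", "256px"]))
```

In a real pipeline the stage switch would change the data resolution or frame count fed to the model, so early training runs on cheap inputs and the expensive configuration is reserved for the end.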


First-Order Adaptive Sample Size Methods to Reduce Complexity of Empirical Risk Minimization

Aryan Mokhtari, Alejandro Ribeiro

Neural Information Processing Systems

This paper studies empirical risk minimization (ERM) problems for large-scale datasets and incorporates the idea of adaptive sample size methods to improve the guaranteed convergence bounds for first-order stochastic and deterministic methods. In contrast to traditional methods that attempt to solve the ERM problem corresponding to the full dataset directly, adaptive sample size schemes start with a small number of samples and solve the corresponding ERM problem to its statistical accuracy. The sample size is then grown geometrically - e.g., scaled by a factor of two - and the solution of the previous ERM problem is used as a warm start for the new one. Theoretical analyses show that the use of adaptive sample size methods reduces the overall computational cost of achieving the statistical accuracy of the whole dataset for a broad range of deterministic and stochastic first-order methods. The gains are specific to the choice of method. When particularized to, e.g., accelerated gradient descent and stochastic variance reduced gradient (SVRG), the computational cost advantage is a logarithm of the number of training samples. Numerical experiments on various datasets confirm theoretical claims and showcase the gains of using the proposed adaptive sample size scheme.
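The geometric sample-size growth with warm starts can be illustrated on a toy ERM problem; the quadratic loss, learning rate, and tolerance below are illustrative choices for the sketch, not the paper's analysis.

```python
import numpy as np

# Illustrative sketch of the adaptive sample size scheme on a toy ERM problem:
# estimate the mean of a dataset by gradient descent, doubling the sample size
# at each stage and warm-starting from the previous stage's solution.

def solve_erm(data, w0, lr=0.5, tol=1e-4):
    """Gradient descent on the quadratic ERM loss mean((w - x)^2) / 2."""
    w, steps = w0, 0
    while abs(grad := w - data.mean()) > tol:
        w -= lr * grad
        steps += 1
    return w, steps

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=1024)

w, total_steps, n = 0.0, 0, 2
while n <= len(data):
    w, steps = solve_erm(data[:n], w)  # warm start from the previous solution
    total_steps += steps
    n *= 2                             # grow the sample size geometrically

print(f"final estimate {w:.3f}, true ERM solution {data.mean():.3f}")
```

Because each stage's solution is already within statistical accuracy of the next stage's optimum, every ERM subproblem needs only a few iterations, which is the intuition behind the logarithmic cost advantage claimed above.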


Models Got Talent: Identifying High Performing Wearable Human Activity Recognition Models Without Training

Goldman, Richard, Komperla, Varun, Ploetz, Thomas, Haresamudram, Harish

arXiv.org Artificial Intelligence

A promising alternative to computationally expensive Neural Architecture Search (NAS) involves Zero-Cost Proxies (ZCPs), which correlate well with trained performance but can be computed through a single forward/backward pass on a randomly sampled batch of data. In this paper, we investigate the effectiveness of ZCPs for Human Activity Recognition (HAR) on six benchmark datasets, and demonstrate that they discover network architectures that come within 5% of the performance attained by full-scale training of 1,500 randomly sampled architectures. This results in substantial computational savings, as high-performing architectures can be discovered with minimal training. Our experiments not only introduce ZCPs to sensor-based HAR, but also demonstrate that they are robust to data noise, further showcasing their suitability for practical scenarios.
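As an illustration of the single forward/backward-pass idea, the sketch below scores randomly initialised networks with a generic gradient-norm proxy; this is an assumed, simplified ZCP, not necessarily one of the proxies evaluated in the paper, and the one-hidden-layer architecture and random batch are toy stand-ins.

```python
import numpy as np

# A minimal "grad-norm"-style zero-cost proxy: score a randomly initialised
# one-hidden-layer network by the gradient norm from a single forward/backward
# pass on one batch, with no training at all.

def grad_norm_proxy(hidden, batch, targets, rng):
    d = batch.shape[1]
    W1 = rng.normal(0, 1 / np.sqrt(d), (d, hidden))
    W2 = rng.normal(0, 1 / np.sqrt(hidden), (hidden, 1))
    # forward pass
    h = np.maximum(batch @ W1, 0.0)   # ReLU activations
    pred = h @ W2
    err = pred - targets              # MSE gradient at the output
    # backward pass (manual, so the sketch stays dependency-free)
    gW2 = h.T @ err / len(batch)
    gh = (err @ W2.T) * (h > 0)
    gW1 = batch.T @ gh / len(batch)
    return np.sqrt((gW1 ** 2).sum() + (gW2 ** 2).sum())

rng = np.random.default_rng(0)
batch = rng.normal(size=(32, 8))
targets = rng.normal(size=(32, 1))
# Rank candidate widths by proxy score instead of training each candidate.
scores = {h: grad_norm_proxy(h, batch, targets, rng) for h in (4, 16, 64)}
print(scores)
```

The search then keeps only the top-scoring candidates for (optional) full training, which is where the computational savings come from.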


ArchPilot: A Proxy-Guided Multi-Agent Approach for Machine Learning Engineering

Yuan, Zhuowen, Liu, Tao, Yang, Yang, Wang, Yang, Qi, Feng, Rangadurai, Kaushik, Li, Bo, Yang, Shuang

arXiv.org Artificial Intelligence

Recent LLM-based agents have demonstrated strong capabilities in automated ML engineering. However, they rely heavily on repeated full training runs to evaluate candidate solutions, resulting in significant computational overhead, limited scalability to large search spaces, and slow iteration cycles. To address these challenges, we introduce ArchPilot, a multi-agent system that integrates architecture generation, proxy-based evaluation, and adaptive search into a unified framework. ArchPilot consists of three specialized agents: an orchestration agent that coordinates the search process using a novel Monte Carlo Tree Search (MCTS)-inspired algorithm with a restart mechanism and manages a memory of previous candidates; a generation agent that iteratively generates, improves, and debugs candidate architectures; and an evaluation agent that executes proxy training runs, generates and optimizes proxy functions, and aggregates the proxy scores into a fidelity-aware performance metric. This multi-agent collaboration allows ArchPilot to prioritize high-potential candidates with minimal reliance on expensive full training runs, facilitating efficient ML engineering under limited budgets. Experiments on MLE-Bench demonstrate that ArchPilot outperforms SOTA baselines such as AIDE and ML-Master, validating the effectiveness of our multi-agent system.
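The MCTS-inspired prioritization of candidates under cheap proxy evaluations can be caricatured as a bandit loop; the UCB rule, noise model, and constants below are assumptions for illustration only, and the sketch omits ArchPilot's restart mechanism and multi-agent structure.

```python
import math
import random

# Each "architecture" is an arm whose pulls are cheap, noisy proxy evaluations
# rather than full training runs; UCB concentrates the budget on promising arms.

def ucb_search(proxy_means, budget=500, c=0.5, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(proxy_means)
    sums = [0.0] * len(proxy_means)
    for t in range(1, budget + 1):
        # pick the arm with the highest upper confidence bound
        best = max(
            range(len(proxy_means)),
            key=lambda i: float("inf") if counts[i] == 0
            else sums[i] / counts[i] + c * math.sqrt(math.log(t) / counts[i]),
        )
        reward = proxy_means[best] + rng.gauss(0, 0.05)  # noisy proxy eval
        counts[best] += 1
        sums[best] += reward
    return counts.index(max(counts))  # most-evaluated candidate

# Three candidate architectures with hidden proxy quality 0.3, 0.5, 0.8.
print(ucb_search([0.3, 0.5, 0.8]))
```

The point of the caricature is the budget allocation: weak candidates receive only a handful of proxy evaluations, while the strong one absorbs almost the entire budget before any full training is attempted.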


Appendix for "Efficient Low-rank Backpropagation for Vision Transformer Adaptation": A. More Experimental Results for Full Training in Table 2, Section 4.2

Neural Information Processing Systems

Table 5 shows more results for training the entire model; these results further demonstrate the effectiveness of our LBP-WHT approach. In Table 5, "Hybrid" denotes the CNN-ViT-hybrid EfficientFormer L1 architecture, and any results with higher speedup or mAcc than full backpropagation (Full BP) are highlighted in bold. LoRA, on the other hand, efficiently reduces the memory usage needed to store the weight gradients. These results confirm the effectiveness of our method. As shown in Table 7, our method also scales well on large-scale datasets.



Parsimonious Dataset Construction for Laparoscopic Cholecystectomy Structure Segmentation

Zhou, Yuning, Badgery, Henry, Read, Matthew, Bailey, James, Davey, Catherine

arXiv.org Artificial Intelligence

Labeling has always been expensive in the medical context, which has hindered related deep learning applications. Our work introduces active learning into surgical video frame selection to construct a high-quality, affordable Laparoscopic Cholecystectomy dataset for semantic segmentation. Active learning allows the Deep Neural Network (DNN) learning pipeline to include the dataset construction workflow: DNNs trained on the existing dataset identify the most informative data among the newly collected data. At the same time, the DNNs' performance and generalization ability improve over time as the newly selected and annotated data are included in the training data. We assessed different data informativeness measurements and found that deep feature distances select the most informative data in this task. Our experiments show that with half of the data selected by active learning, the DNNs achieve almost the same performance on the critical anatomies and surgical instruments, with 0.4349 mean Intersection over Union (mIoU), as the same DNNs trained on the full dataset (0.4374 mIoU).
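One common way to operationalize "deep feature distances" for informativeness is farthest-point (k-center greedy) selection; the sketch below assumes that criterion, with random vectors standing in for DNN features, and is not necessarily the exact measurement used in this work.

```python
import numpy as np

# Greedily pick the frame whose (deep) feature vector is farthest from
# everything already selected -- k-center greedy / farthest-point sampling.
# Real features would come from a trained DNN; random vectors stand in here.

def select_informative(features, k):
    chosen = [0]                                   # seed with the first frame
    dists = np.linalg.norm(features - features[0], axis=1)
    while len(chosen) < k:
        nxt = int(dists.argmax())                  # farthest from selected set
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(features - features[nxt], axis=1))
    return chosen

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 16))                 # 100 frames, 16-dim features
picked = select_informative(feats, k=10)
print(picked)
```

Only the selected frames are sent for expensive expert annotation, and the updated model's features drive the next selection round.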


Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty

Cho, Yeseul, Shin, Baekrok, Kang, Changmin, Yun, Chulhee

arXiv.org Artificial Intelligence

Advancements in deep learning have been significantly driven by large-scale datasets. However, recent studies have revealed a power-law relationship between the generalization capacity of deep neural networks and the size of their training data (Gordon et al., 2021; Hestness et al., 2017; Rosenfeld et al., 2019), meaning that improvements in model performance become increasingly cost-inefficient as the dataset size scales up. Fortunately, Sorscher et al. (2022) demonstrate that the power-law scaling of error can be reduced to exponential scaling with Pareto-optimal data pruning. The main goal of dataset pruning is to identify and retain the most informative samples while discarding redundant data points when training neural networks. This approach can alleviate storage and computational costs and improve training efficiency. However, many existing pruning methods require training a model on the full dataset for a number of epochs to measure the importance of each sample, which ironically makes the pruning process more expensive than simply training the model once on the original large dataset. For instance, several score-based methods (Gordon et al., 2021; He et al., 2024; Pleiss et al., 2020; Toneva et al., 2018; Zhang et al., 2024) require training, as they utilize the dynamics of the whole training process. Some geometry-based methods (Xia et al., 2022; Yang et al., 2024) leverage features from the penultimate layer of the trained model, so training a model is still required.
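A minimal sketch of the uncertainty side of such scoring, assuming softmax entropy as the uncertainty measure and random logits standing in for a lightly trained model's predictions (the paper's actual scores combine difficulty and uncertainty; this shows only the generic mechanism):

```python
import numpy as np

# Score each sample by the entropy of the model's predicted class distribution
# and keep the most uncertain fraction, discarding easy/redundant points.

def predictive_entropy(logits):
    p = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    p /= p.sum(axis=1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=1)

def prune(logits, keep_fraction=0.5):
    scores = predictive_entropy(logits)
    k = int(len(logits) * keep_fraction)
    return np.argsort(scores)[-k:]      # indices of most uncertain samples

rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10))    # 1000 samples, 10 classes
kept = prune(logits, keep_fraction=0.5)
print(len(kept))
```

The crucial cost property is that the logits come from a cheap, partially trained model, so no full training run is needed before pruning.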

